Skip to content

Fix169#241

Merged
mfeurer merged 19 commits intodevelopfrom
fix169
May 4, 2017
Merged

Fix169#241
mfeurer merged 19 commits intodevelopfrom
fix169

Conversation

@janvanrijn
Copy link
Copy Markdown
Member

No description provided.

@codecov-io
Copy link
Copy Markdown

codecov-io commented May 2, 2017

Codecov Report

Merging #241 into develop will increase coverage by 0.46%.
The diff coverage is 94.18%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #241      +/-   ##
===========================================
+ Coverage    88.36%   88.82%   +0.46%     
===========================================
  Files           23       24       +1     
  Lines         1960     2032      +72     
===========================================
+ Hits          1732     1805      +73     
+ Misses         228      227       -1
Impacted Files Coverage Δ
openml/runs/run.py 89.85% <100%> (+3.04%) ⬆️
openml/setups/setup.py 100% <100%> (ø)
openml/runs/__init__.py 100% <100%> (ø) ⬆️
openml/setups/__init__.py 100% <100%> (ø) ⬆️
openml/setups/functions.py 98.46% <100%> (-1.54%) ⬇️
openml/testing.py 98.11% <100%> (+0.03%) ⬆️
openml/runs/functions.py 83% <78.26%> (+0.08%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65e8758...85472cc. Read the comment docs.

Copy link
Copy Markdown
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I only got through half of this PR yet. Will continue tomorrow.

Comment thread openml/runs/functions.py
for param_name in sorted(model_params):
if 'random_state' in param_name:
currentValue = model_params[param_name]
# important to draw the value at this point (and not in the if statement)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, could you explain why? It's not clear to me from this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description.

I'm not sure if we really need this, but seems nice property to respect.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

Comment thread openml/runs/functions.py
return False

def _get_seeded_model(model, seed=None):
'''Sets all the non-seeded components of a model with a seed.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should mention the restriction that one cannot use a random state in the pipelines.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. One could argue that that is a restriction of the run_tasks function, as that is the function the user interacts with. Furthermore, that function is responsible for the check.

This function only adds seeds to unseeded models

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it can raise an exception:

import numpy as np
import sklearn.ensemble

import openml


rf = sklearn.ensemble.RandomForestClassifier(
    random_state=np.random.RandomState(1))
openml.runs.functions._get_seeded_model(rf, 5)

But you're right, it should be documented in the run_tasks() function.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mea culpa, i will add it

Comment thread openml/runs/functions.py

return run

def initialize_model_from_run(run_id):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neither used nor tested.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I added tests.

Copy link
Copy Markdown
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly through, I olny need to understand why test_existing_setup_exists has to use a different classifier.

Comment thread openml/setups/setup.py Outdated

Parameters
----------
flow_id : int
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you copied the docstring from the object above and didn't adapt it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I loove copy/pasting. fixed it.

Comment thread openml/runs/run.py
_current['oml:component'] = main_id
else:
raise ValueError("parameter %s not in flow description of flow %s" %(param,flow.name))
_current['oml:component'] = _param_dict[_flow.name]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it once an ID, and once a name?

Comment thread openml/runs/run.py

@staticmethod
def _parse_parameters(model, flow):
def _parse_parameters(model, server_flow):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again I'm actually surprised that this is not called run_task, but only when publishing. But maybe this should be its own issue/PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate a bit? I don really understand which / why

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant "that this is not called IN run_task"

current = openml.setups.get_setup(setups[idx])
assert current.flow_id > 0
if num_params[idx] == 0:
assert current.parameters is None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use self.assert in the unit tests? It gives nicer outputs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean self.asserts()?
Sure.

@janvanrijn
Copy link
Copy Markdown
Member Author

test_existing_setup_exists makes use of sentinels. That gives us a general problem when serializing a flow (or setup) and comparing it to one on the server, as the flow (setup) serialization is not aware of the (and which) sentinel string was used. For that reason, I slightly generalized one of the functions, such that it does not rely on name mappings for the main flow (boolean flag that indicates a function call on depth 1). This way, we do not have to remove the sentinels and we can in this context still test flows without subflows.

assert current.flow_id > 0
if num_params[idx] == 0:
self.asserts(current.parameters is None)
self.assertTrue(current.parameters is None)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been more specific. The correct one here would be self.assertIsNone.

self.assertTrue(current.parameters is None)
else:
self.asserts(len(current.parameters) == num_params[idx])
self.assertTrue(len(current.parameters) == num_params[idx])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been more specific. The correct one here would be self.assertEqual.

@mfeurer
Copy link
Copy Markdown
Collaborator

mfeurer commented May 3, 2017

Thanks for the explanation of the sentinels. Maybe it would be good to add that to the actual unit test. What I'm wondering right now is if this is save with respect to running the unit tests in parallel (which happens on travis-ci). You assume:

# although the flow exists, we can be sure there are no
# setups (yet) as it hasn't been ran

it could happen that it was already run. Maybe adding a sentinel to a hyperparameter? Or is there some other way of making a setup unique?

Once we figured out this part, I think we can merge this PR before it becomes too big and create new PRs for the missing functionality.

@janvanrijn
Copy link
Copy Markdown
Member Author

I had already added this in the comment:

# because of the sentinel, we can not use flows that contain subflows

Also, the other comment was slightly confusing. I changed this to:

# although the flow exists (created as of previous statement),
# we can be sure there are no setups (yet) as it was just created
# and hasn't been ran

Should be fine now? Let's merge

@mfeurer mfeurer merged commit 89a3dd4 into develop May 4, 2017
@mfeurer mfeurer deleted the fix169 branch May 4, 2017 14:41
@mfeurer mfeurer mentioned this pull request May 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants